sarima

Modelos SARIMA

Ejercicio

Pronóstico a 4 años usando SARIMA. Se busca el menor MAPE de pronóstico

ARIMA model including seasonality: \((P,D,Q)_m\) where m is the seasonal period. Where \((p,q,d)\) is the not seasonal part of the model.

p: PACF

d: unitroot_ndiffs

q: ACF

P: PACF

Q: unitroot_nsdiffs

D: ACF (seasonal lags)

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fpp3)
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr
── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
✔ tsibble     1.1.6     ✔ feasts      0.4.1
✔ tsibbledata 0.4.1     ✔ fable       0.4.1
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()
library(tidyquant)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
✔ PerformanceAnalytics 2.0.8      ✔ TTR                  0.24.4
✔ quantmod             0.4.28     ✔ xts                  0.14.1── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
✖ zoo::as.Date()                 masks base::as.Date()
✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
✖ dplyr::filter()                masks stats::filter()
✖ xts::first()                   masks dplyr::first()
✖ zoo::index()                   masks tsibble::index()
✖ tsibble::interval()            masks lubridate::interval()
✖ dplyr::lag()                   masks stats::lag()
✖ xts::last()                    masks dplyr::last()
✖ PerformanceAnalytics::legend() masks graphics::legend()
✖ quantmod::summary()            masks base::summary()
✖ tidyquant::VAR()               masks fable::VAR()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'tidyquant'

The following object is masked from 'package:fable':

    VAR
library(plotly)

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(ggplot2)
 h02 <- PBS |> 
   filter(ATC2 == "H02") |> 
   summarise(Cost = sum(Cost)/1e6)
 
 h02_train = h02 |> 
   filter_index(. ~ "2004 Jun.")
 
 h02_train |> 
   autoplot(Cost)

Se requiere estabilizar la varianza, se probará con logaritmos:

h02_train |> 
   autoplot(log(Cost))

Ahora falta estabilizar la media para hacer la serie estacionaria:

h02_train |> 
  features(log(Cost), unitroot_nsdiffs)
# A tibble: 1 × 1
  nsdiffs
    <int>
1       1

Aplicando las diferencias estacionales:

h02_train |> 
  autoplot(log(Cost) |> difference(lag = 12)) #la serie es mensual, m=12

h02_train |> 
  features(log(Cost) |> difference(lag = 12), unitroot_ndiffs)
# A tibble: 1 × 1
  ndiffs
   <int>
1      1

La serie además de la diferencia estacional, necesita 1 diferencia no estacional.

h02_train |> 
  gg_tsdisplay(log(Cost) |> difference(lag = 12) |> difference(), plot_type = "partial", lag_max = 48)

p: {1,2} (pico en rezago 1 PACF)

d: 1

q: {1 pico significativo en rezago 1 y otro en 2 pero el más significativo es 1

P: 0 porque el primero no es significativo

D: 1

Q: 0 porque en el periodo 12 no hay un corte significativo

Como m es 12, los periodos significativos se revisan cada 12

Además del modelo logarítmico, se probará la transformación matemática con box-cox para ver si hay diferencias significativas en los modelos:

h02_lambda <- h02_train |> 
  features(Cost, features= guerrero) |> 
  pull()

h02_lambda
[1] 0.2522755
h02_fit <- h02_train |> 
  model(
    semi_auto_log = ARIMA(log(Cost) ~ pdq(p = 1:2,1,0:1) + PDQ(0,1,0)),
    semi_auto_bx  = ARIMA(box_cox(Cost, h02_lambda) ~ pdq(p = 1:2,1,0:1) + PDQ(0,1,0)),
    arima_log_1 = ARIMA(log(Cost) ~ pdq(p = 1,1,1) + PDQ(0,1,0)),
    arima_bx_1  = ARIMA(box_cox(Cost, h02_lambda) ~ pdq(p = 1,1,1) + PDQ(0,1,0)),
  )

h02_fit |> glance() |> 
  arrange(AICc)
# A tibble: 4 × 8
  .model         sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>           <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 semi_auto_bx  0.00350    202. -398. -398. -389. <cpl [2]> <cpl [0]>
2 arima_bx_1    0.00362    200. -393. -393. -385. <cpl [1]> <cpl [1]>
3 semi_auto_log 0.00406    191. -377. -377. -368. <cpl [2]> <cpl [0]>
4 arima_log_1   0.00420    189. -372. -372. -363. <cpl [1]> <cpl [1]>
h02_fit |> 
  accuracy() |> 
  arrange(MAPE)
# A tibble: 4 × 10
  .model        .type          ME   RMSE    MAE    MPE  MAPE  MASE RMSSE    ACF1
  <chr>         <chr>       <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
1 semi_auto_bx  Training -9.93e-4 0.0474 0.0329 -0.219  4.41 0.564 0.673 -0.0244
2 semi_auto_log Training -1.65e-3 0.0479 0.0331 -0.265  4.42 0.567 0.680 -0.0247
3 arima_bx_1    Training -8.09e-4 0.0489 0.0342 -0.239  4.54 0.585 0.694 -0.0487
4 arima_log_1   Training -1.48e-3 0.0496 0.0345 -0.282  4.57 0.592 0.703 -0.0506
h02_forecasts <- h02_fit |> forecast(h = "4 years")

p <- h02_forecasts |> autoplot(h02) +
  labs(title = "SARIMA",
       x = "Month", y = "Cost") +
    theme_minimal()

plotly::ggplotly(p)
Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_() has yet to be implemented in plotly.
  If you'd like to see this geom implemented,
  Please open an issue with your example code at
  https://github.com/ropensci/plotly/issues
Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_() has yet to be implemented in plotly.
  If you'd like to see this geom implemented,
  Please open an issue with your example code at
  https://github.com/ropensci/plotly/issues
Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_() has yet to be implemented in plotly.
  If you'd like to see this geom implemented,
  Please open an issue with your example code at
  https://github.com/ropensci/plotly/issues
Warning in geom2trace.default(dots[[1L]][[4L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_() has yet to be implemented in plotly.
  If you'd like to see this geom implemented,
  Please open an issue with your example code at
  https://github.com/ropensci/plotly/issues
h02_forecasts <- h02_fit |> forecast(h = "4 years")

h02_forecasts |> autoplot(h02) +
  labs(title = "SARIMA",
       x = "Month", y = "log(Cost)") +
    theme_minimal()

De acuerdo con las métricas el mejor modelo es semi_auto_bx:

h02_forecasts |> 
  filter(.model == "semi_auto_bx") |> 
  autoplot()

h02_fit |> 
  select(semi_auto_bx) |>
  report()
Series: Cost 
Model: ARIMA(2,1,0)(0,1,0)[12] 
Transformation: box_cox(Cost, h02_lambda) 

Coefficients:
          ar1      ar2
      -0.7431  -0.3907
s.e.   0.0769   0.0766

sigma^2 estimated as 0.003505:  log likelihood=202
AIC=-398   AICc=-397.83   BIC=-389.12

El modelo que muestra la mejor métrica de error es el semi_auto_bx, utilizando la transformación matemática de box-cox para estabilizar la varianza y un modelo \(ARIMA(2,1,0)(0,1,0)_{12}\) . Con un MAPE de 4.412171% de error de pronóstico.